Abstract

In this paper, we characterize by lexicographic order all finite Sturmian and episturmian
words, i.e., all (finite) factors of such infinite words. Consequently, we obtain
a characterization of infinite episturmian words in a wide sense (episturmian and 
episkew infinite words). That is, we characterize the set of all infinite words whose 
factors are (finite) episturmian. Similarly, we characterize by lexicographic order all 
balanced infinite words over a 2-letter alphabet; in other words, all Sturmian and 
skew infinite words, the factors of which are (finite) Sturmian. 
1 Introduction

The family of episturmian words is an interesting natural generalization of 
the well-known Sturmian words (a particular class of binary infinite words) 
to an arbitrary finite alphabet, introduced by Droubay, Justin, and Pirillo [5]
(also see [8,13,15,16] for instance). Episturmian words share many properties 
with Sturmian words and include the well-known Arnoux-Rauzy sequences, the 
study of which began in [2] (also see [14,24] for example). 

In this paper, we characterize by lexicographic order all finite Sturmian and 
episturmian words, i.e., all (finite) factors of such infinite words. Consequently, 
we obtain a characterization of episturmian words in a wide sense (episturmian
and episkew infinite words). That is, we characterize the set of all infinite
words whose factors are (finite) episturmian. Similarly, we characterize by
lexicographic order all balanced infinite words over a 2-letter alphabet; in other
words, all Sturmian and skew infinite words, the factors of which are (finite) 
Sturmian. 

To any infinite word t we can associate two infinite words min(t) and max(t) 
such that any prefix of min(t) (resp. max(t)) is the lexicographically smallest 
(resp. greatest) amongst the factors of t of the same length (see [20] or Section 
2.1). Our main results in this paper extend recent work by Pirillo [20,21], Justin 
and Pirillo [14], and Glen [9]. In the first of these papers, Pirillo proved that, 
for infinite words $s$ on a 2-letter alphabet $\{a, b\}$ with $a < b$, the inequality
$as \le \min(s) \le \max(s) \le bs$ characterizes standard Sturmian words (both
aperiodic and periodic). Similarly, an infinite word $s$ on a finite alphabet $A$
is standard episturmian if and only if, for any letter $a \in A$ and lexicographic
order $<$ satisfying $a = \min(A)$, we have

$$as \le \min(s). \tag{1}$$

Moreover, $s$ is a strict standard episturmian word (i.e., a standard Arnoux-Rauzy
sequence [2,24]) if and only if (1) holds with equality, i.e., $as = \min(s)$ [14]. In
a similar spirit, Pirillo [21] very recently defined fine words over two letters; 
that is, an infinite word t over a 2-letter alphabet {a, b} (a < b) is said to 
be fine if (min(t), max(t)) = (as, bs) for some infinite word s. These words 
are characterized in [21] by showing that fine words on {a, b} are exactly the 
aperiodic Sturmian and skew infinite words. 

Glen [9] recently extended Pirillo's definition of fine words to an arbitrary 
finite alphabet; that is, an infinite word t is fine if there exists an infinite 
word $s$ such that $\min(t) = as$ for any letter $a \in \mathrm{Alph}(t)$ and lexicographic
order < satisfying a = min(Alph(t)). (Here, Alph(t) denotes the alphabet of 
t, i.e., the set of distinct letters occurring in t.) These generalized fine words 
are characterized in [9]; specifically, it is shown that an infinite word t is fine 
if and only if t is either a strict episturmian word, or a strict episkew word 
(i.e., a particular kind of non-recurrent infinite word, all of whose factors are
episturmian). Here, we prove further that an infinite word t is episturmian in 
the wide sense (episturmian or episkew) if and only if there exists an infinite 
word $u$ such that $au \le \min(t)$ for any letter $a \in A$ and lexicographic order $<$
satisfying $a = \min(A)$. This result follows easily from our characterization of
finite episturmian words in Section 4.

This paper is organized as follows. Section 2 contains all of the necessary 
terminology and notation concerning words, morphisms, and Sturmian and 
episturmian words. In Section 3, we give a number of equivalent definitions of 
episkew words, and recall the aforementioned characterizations of fine words. 
Then, in Section 4, we prove a new characterization of finite episturmian words, 
from which a new characterization of finite Sturmian words is an easy consequence.
Lastly, in Section 5, we obtain characterizations of episturmian words
in the wide sense and balanced binary infinite words, which follow from the 
main results in Sections 3 and 4. 



2 Preliminaries 

2.1 Words and morphisms 

Let $A$ denote a finite alphabet. A (finite) word is an element of the free
monoid $A^*$ generated by $A$, in the sense of concatenation. The identity $\varepsilon$ of
$A^*$ is called the empty word, and the free semigroup, denoted by $A^+$, is defined
by $A^+ := A^* \setminus \{\varepsilon\}$. An infinite word (or simply sequence) $x$ is a sequence
indexed by $\mathbb{N}$ with values in $A$, i.e., $x = x_0x_1x_2\cdots$, where each $x_i \in A$. The
set of all infinite words over $A$ is denoted by $A^\omega$, and we define $A^\infty := A^* \cup A^\omega$.

If $w = x_1x_2\cdots x_m \in A^+$, each $x_i \in A$, the length of $w$ is $|w| = m$ and we
denote by $|w|_a$ the number of occurrences of a letter $a$ in $w$. (Note that $|\varepsilon| = 0$.)
The reversal $\tilde{w}$ of $w$ is given by $\tilde{w} = x_mx_{m-1}\cdots x_1$, and if $w = \tilde{w}$, then $w$ is
called a palindrome.

An infinite word $x = x_0x_1x_2\cdots$, each $x_i \in A$, is said to be periodic (resp. ultimately
periodic) with period $p$ if $p$ is the smallest positive integer such that
$x_i = x_{i+p}$ for all $i \in \mathbb{N}$ (resp. for all $i \ge m$ for some $m \in \mathbb{N}$). If $u, v \in A^+$, then
$v^\omega$ (resp. $uv^\omega$) denotes the periodic (resp. ultimately periodic) infinite word
$vvv\cdots$ (resp. $uvvv\cdots$) having $|v|$ as a period.

A finite word $w$ is a factor of $z \in A^\infty$ if $z = uwv$ for some $u \in A^*$, $v \in A^\infty$.
Further, $w$ is called a prefix (resp. suffix) of $z$ if $u = \varepsilon$ (resp. $v = \varepsilon$).

An infinite word $x \in A^\omega$ is called a suffix of $z \in A^\omega$ if there exists a word
$w \in A^*$ such that $z = wx$. A factor $w$ of a word $z \in A^\infty$ is right (resp. left)
special if $wa$, $wb$ (resp. $aw$, $bw$) are factors of $z$ for some letters $a, b \in A$, $a \ne b$.

For any word $w \in A^\infty$, $F(w)$ denotes the set of all its factors, and $F_n(w)$
denotes the set of all factors of $w$ of length $n \in \mathbb{N}$, i.e., $F_n(w) := F(w) \cap A^n$
(where $|w| \ge n$ for $w$ finite). Moreover, the alphabet of $w$ is $\mathrm{Alph}(w) :=
F(w) \cap A$ and, if $w$ is infinite, we denote by $\mathrm{Ult}(w)$ the set of all letters
occurring infinitely often in $w$. Two infinite words $x, y \in A^\omega$ are said to be
equivalent if $F(y) = F(x)$, i.e., if $x$ and $y$ have the same set of factors. A
factor of an infinite word $x$ is recurrent in $x$ if it occurs infinitely many times
in $x$, and $x$ itself is said to be recurrent if all of its factors are recurrent in it.

Suppose the alphabet $A$ is totally ordered by the relation $<$. Then we can
totally order $A^+$ by the lexicographic order $\le$, defined as follows. Given two
words $u, v \in A^+$, we have $u < v$ if and only if either $u$ is a proper prefix of $v$
or $u = xau'$ and $v = xbv'$, for some $x, u', v' \in A^*$ and letters $a, b$ with $a < b$.
This is the usual alphabetic ordering in a dictionary, and we say that $u$ is
lexicographically less than $v$. This notion naturally extends to $A^\omega$, as follows.
Let $u = u_0u_1u_2\cdots$ and $v = v_0v_1v_2\cdots$, where $u_j, v_j \in A$. We define $u < v$
if there exists an index $i \ge 0$ such that $u_j = v_j$ for all $j = 0, \ldots, i-1$ and
$u_i < v_i$. Naturally, $\le$ will mean $<$ or $=$.

Let $w \in A^\infty$ and let $k$ be a positive integer. We denote by $\min(w|k)$
(resp. $\max(w|k)$) the lexicographically smallest (resp. greatest) factor of $w$
of length $k$ for the given order (where $|w| \ge k$ for $w$ finite). If $w$ is infinite,
then it is clear that $\min(w|k)$ and $\max(w|k)$ are prefixes of the respective
words $\min(w|k+1)$ and $\max(w|k+1)$. So we can define, by taking limits, the
following two infinite words (see [20]):

$$\min(w) = \lim_{k \to \infty} \min(w|k) \quad \text{and} \quad \max(w) = \lim_{k \to \infty} \max(w|k).$$
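For a finite word, the quantities $\min(w|k)$ and $\max(w|k)$ can be computed by brute force. The following Python sketch is only an illustration: the function names, and the convention of passing the order as a string that lists the letters from smallest to greatest, are assumptions made for this example and are not notation from the paper.

```python
def factors(w, k):
    """Return the set of factors of length k of the finite word w."""
    return {w[i:i + k] for i in range(len(w) - k + 1)}


def rank_key(order):
    """Sort key for the lexicographic order induced by `order`,
    a string listing the letters from smallest to greatest."""
    rank = {letter: i for i, letter in enumerate(order)}
    return lambda word: [rank[c] for c in word]


def min_factor(w, k, order):
    """Lexicographically smallest factor of w of length k."""
    return min(factors(w, k), key=rank_key(order))


def max_factor(w, k, order):
    """Lexicographically greatest factor of w of length k."""
    return max(factors(w, k), key=rank_key(order))


# Sample input (a prefix of the Fibonacci word), chosen only for illustration.
w = "abaababaab"
print([min_factor(w, k, "ab") for k in range(1, 5)])
print([max_factor(w, k, "ab") for k in range(1, 5)])
```

On this sample the successive minima (and maxima) are prefixes of one another, which is the behaviour that the limit definition relies on for infinite words.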

The inverse of $w \in A^*$, written $w^{-1}$, is defined by $ww^{-1} = w^{-1}w = \varepsilon$. It
must be emphasized that this is merely formal notation, i.e., for $u, v, w \in A^*$,
the words $u^{-1}w$ and $wv^{-1}$ are defined only if $u$ (resp. $v$) is a prefix (resp. suffix)
of $w$.

A morphism on $A$ is a map $\psi : A^* \to A^*$ such that $\psi(uv) = \psi(u)\psi(v)$
for all $u, v \in A^*$. It is uniquely determined by its image on the alphabet $A$.
The action of morphisms on $A^*$ naturally extends to infinite words; that is, if
$x = x_0x_1x_2\cdots \in A^\omega$, then $\psi(x) = \psi(x_0)\psi(x_1)\psi(x_2)\cdots$.

In what follows, we shall assume that A contains two or more letters. 

2.2 Sturmian words 

Sturmian words admit several equivalent definitions and have numerous characterizations;
for instance, they can be characterized by their palindrome or
return word structure [6,16]. A particularly useful definition of Sturmian words 
is the following. 
Definition 2.1 An infinite word $s$ over $\{a, b\}$ is Sturmian if there exist real
numbers $\alpha, \rho \in [0, 1]$ such that $s$ is equal to one of the following two infinite
words:

$$s_{\alpha,\rho},\ s'_{\alpha,\rho} : \mathbb{N} \to \{a, b\}$$

defined by

$$s_{\alpha,\rho}(n) = \begin{cases} a & \text{if } \lfloor (n+1)\alpha + \rho \rfloor - \lfloor n\alpha + \rho \rfloor = 0, \\ b & \text{otherwise;} \end{cases}
\qquad
s'_{\alpha,\rho}(n) = \begin{cases} a & \text{if } \lceil (n+1)\alpha + \rho \rceil - \lceil n\alpha + \rho \rceil = 0, \\ b & \text{otherwise.} \end{cases}
\qquad (n \ge 0)$$

Moreover, $s$ is said to be standard Sturmian if $\rho = \alpha$.

Remark 2.2 A Sturmian word of slope $\alpha$ is:

• aperiodic (i.e., not ultimately periodic) if $\alpha$ is irrational;

• periodic if $\alpha$ is rational.

Nowadays, for most authors, only the aperiodic Sturmian words are considered
to be 'Sturmian'. In several of our previous papers (see [9,12,15,19,21] for
instance), we have referred to aperiodic Sturmian words as 'proper Sturmian'
to highlight the fact that such Sturmian words correspond to the most common
sense of 'Sturmian' now. In the present paper, the term 'Sturmian' will
refer to both aperiodic and periodic Sturmian words.

Definition 2.3 A finite or infinite word $w$ over $\{a, b\}$ is said to be balanced
if, for any factors $u, v$ of $w$ with $|u| = |v|$, we have $\big||u|_b - |v|_b\big| \le 1$ (or
equivalently $\big||u|_a - |v|_a\big| \le 1$).

In the pioneering paper [18], balanced infinite words over a 2-letter alphabet
are called 'Sturmian trajectories' and belong to three classes: aperiodic
Sturmian; periodic Sturmian; and non-recurrent infinite words that are ultimately
periodic (but not periodic), called skew words. That is, the family of
balanced infinite words consists of the (recurrent) Sturmian words and the
(non-recurrent) skew infinite words, all of whose factors are balanced.

It is important to note that a finite word is finite Sturmian (i.e., a factor of
some Sturmian word) if and only if it is balanced [3]. Accordingly, balanced
infinite words are precisely the infinite words whose factors are finite Sturmian.
In Section 5, we will generalize this concept by showing that the set of
all infinite words whose factors are finite episturmian consists of the (recurrent)
episturmian words and the (non-recurrent) episkew infinite words (see
Propositions 3.1 and 5.2, to follow).

For a comprehensive introduction to Sturmian words, see for instance [1,3,22]
and references therein. Also see [10,21] for further work on skew words.
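Definitions 2.1 and 2.3 can be explored numerically. The sketch below generates finite prefixes of the two words of Definition 2.1 and tests the balance property by brute force. The function names are assumptions made for this illustration, and exact rational arithmetic (`fractions.Fraction`) is used to avoid floating-point rounding, so only rational slopes (periodic Sturmian words) are covered.

```python
from fractions import Fraction
from math import ceil, floor


def lower_mechanical(alpha, rho, length):
    """Prefix of the word defined with floors in Definition 2.1."""
    return "".join(
        "a" if floor((n + 1) * alpha + rho) - floor(n * alpha + rho) == 0 else "b"
        for n in range(length)
    )


def upper_mechanical(alpha, rho, length):
    """Prefix of the word defined with ceilings in Definition 2.1."""
    return "".join(
        "a" if ceil((n + 1) * alpha + rho) - ceil(n * alpha + rho) == 0 else "b"
        for n in range(length)
    )


def is_balanced(w):
    """Definition 2.3: for every length, the numbers of b's in any two
    factors of that length differ by at most 1."""
    for k in range(1, len(w) + 1):
        counts = [w[i:i + k].count("b") for i in range(len(w) - k + 1)]
        if max(counts) - min(counts) > 1:
            return False
    return True


third = Fraction(1, 3)
prefix = lower_mechanical(third, third, 12)  # rho = alpha: the standard case
print(prefix, is_balanced(prefix))
```

The check in `is_balanced` is quadratic in the length of the word, which is adequate for small experiments.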
2.3 Episturmian words



For episturmian words and morphisms¹ we use the same terminology and
notation as in [5,13,15].

¹ In [13], Section 5.1 is incorrect and should be ignored.

An infinite word $t \in A^\omega$ is episturmian if $F(t)$ is closed under reversal and
$t$ has at most one right (or equivalently left) special factor of each length.
Moreover, an episturmian word is standard if all of its left special factors are
prefixes of it. Sturmian words are exactly the episturmian words over a 2-letter
alphabet.

Note. Episturmian words are recurrent [5].

Standard episturmian words are characterized in [5] using the concept of
the palindromic right-closure $w^{(+)}$ of a finite word $w$, which is the (unique)
shortest palindrome having $w$ as a prefix (see [4]). Specifically, an infinite word
$t \in A^\omega$ is standard episturmian if and only if there exists an infinite word
$\Delta(t) = x_1x_2x_3\cdots$, each $x_i \in A$, called the directive word of $t$, such that the
infinite sequence of palindromic prefixes $u_1 = \varepsilon, u_2, u_3, \ldots$ of $t$ (which exists
by results in [5]) is given by

$$u_{n+1} = (u_nx_n)^{(+)}, \quad n \in \mathbb{N}^+. \tag{2}$$

Note. An equivalent way of constructing the sequence $(u_n)_{n \ge 1}$ is via the 'hat
operation' [24, Section III].

Let $a \in A$ and denote by $\psi_a$ the morphism on $A$ defined by

$$\psi_a : \begin{cases} a \mapsto a, \\ x \mapsto ax & \text{for all } x \in A \setminus \{a\}. \end{cases}$$

Together with the permutations of the alphabet, all of the morphisms $\psi_a$ generate
by composition the monoid of epistandard morphisms ('epistandard' is
an elegant shortcut for 'standard episturmian' due to Richomme [23]). The
submonoid generated by the $\psi_a$ only is the monoid of pure epistandard morphisms,
which includes the identity morphism $\mathrm{Id}_A = \mathrm{Id}$, and consists of all the
pure standard (Sturmian) morphisms when $|A| = 2$.

Remark 2.4 If $x = \psi_a(y)$ or $x = a^{-1}\psi_a(y)$ for some $y \in A^\omega$ and $a \in A$, then
the letter $a$ is said to be separating for $x$ and its factors; that is, any factor of
$x$ of length 2 contains the letter $a$.

Another useful characterization of standard episturmian words is the following
(see [13]). An infinite word $t \in A^\omega$ is standard episturmian with directive
word $\Delta(t) = x_1x_2x_3\cdots$ ($x_i \in A$) if and only if there exists an infinite sequence
of infinite words $t^{(0)} = t, t^{(1)}, t^{(2)}, \ldots$ such that $t^{(i-1)} = \psi_{x_i}(t^{(i)})$ for
all $i \in \mathbb{N}^+$. Moreover, each $t^{(i)}$ is a standard episturmian word with directive
word $\Delta(t^{(i)}) = x_{i+1}x_{i+2}x_{i+3}\cdots$, the $i$-th shift of $\Delta(t)$.

To the prefixes of the directive word $\Delta(t) = x_1x_2\cdots$, we associate the morphisms

$$\mu_0 := \mathrm{Id}, \quad \mu_n := \psi_{x_1}\psi_{x_2}\cdots\psi_{x_n}, \quad n \in \mathbb{N}^+,$$

and define the words

$$h_n := \mu_n(x_{n+1}), \quad n \in \mathbb{N},$$

which are clearly prefixes of $t$. For the palindromic prefixes $(u_i)_{i \ge 1}$ given by
(2), we have the following useful formula [13]

$$u_{n+1} = h_{n-1}u_n;$$

whence, for $n > 1$ and $0 < p < n$,

$$u_n = h_{n-2}h_{n-3}\cdots h_1h_0 = h_{n-2}h_{n-3}\cdots h_{p-1}u_p. \tag{3}$$

Note. Evidently, if a standard episturmian word $t$ begins with the letter $x \in A$,
then $x$ is separating for $t$ (see [5, Lemma 4]).
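The constructions of this subsection can be tried out on a finite prefix of a directive word. The following Python sketch is an illustration only; the function names are assumptions of this example. It computes the palindromic right-closure, the palindromic prefixes given by (2), the morphisms $\psi_a$, and the words $h_n$, and it compares the two sides of the relation $u_{n+1} = h_{n-1}u_n$ on the directive word $abab$ (a prefix of the directive word of the Fibonacci word).

```python
def palindromic_closure(w):
    """Shortest palindrome having w as a prefix."""
    for i in range(len(w) + 1):
        if w[i:] == w[i:][::-1]:          # w[i:] is the longest palindromic suffix
            return w + w[:i][::-1]


def psi(a, word):
    """Apply psi_a: a -> a, x -> ax for every letter x != a."""
    return "".join(c if c == a else a + c for c in word)


def palindromic_prefixes(directive):
    """List [u_1, u_2, ...] with u_1 empty and u_{n+1} = (u_n x_n)^(+)."""
    u = [""]
    for x in directive:
        u.append(palindromic_closure(u[-1] + x))
    return u


def h_words(directive):
    """List [h_0, h_1, ...] with h_n = psi_{x_1} ... psi_{x_n}(x_{n+1})."""
    hs = []
    for n in range(len(directive)):
        word = directive[n]
        for a in reversed(directive[:n]):  # innermost morphism is applied first
            word = psi(a, word)
        hs.append(word)
    return hs


u = palindromic_prefixes("abab")
h = h_words("abab")
print(u)
print(h)
# List index n holds u_{n+1}, so u[n] should equal h[n - 1] + u[n - 1].
print(all(u[n] == h[n - 1] + u[n - 1] for n in range(1, len(u))))
```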



2.3.1 Strict episturmian words 

A standard episturmian word $t \in A^\omega$, or any equivalent (episturmian) word,
is said to be $B$-strict (or $k$-strict if $|B| = k$, or strict if $B$ is understood)
if $\mathrm{Alph}(\Delta(t)) = \mathrm{Ult}(\Delta(t)) = B \subseteq A$. In particular, a standard episturmian
word over $A$ is $A$-strict if every letter in $A$ occurs infinitely many times in its
directive word. The $k$-strict episturmian words have complexity $(k-1)n + 1$ for
each $n \in \mathbb{N}$; such words are exactly the $k$-letter Arnoux-Rauzy sequences. In
particular, the 2-strict episturmian words correspond to the aperiodic Sturmian
words. The strict standard episturmian words are precisely the standard (or
characteristic) Arnoux-Rauzy sequences.
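The complexity statement can be observed on long finite prefixes. The sketch below builds a palindromic prefix of a standard episturmian word from a finite prefix of a periodic directive word and counts its distinct factors of small length. The function names are assumptions of this illustration, and counting factors of a finite prefix only agrees with the complexity of the infinite word for lengths that are small compared with the prefix.

```python
def palindromic_closure(w):
    """Shortest palindrome having w as a prefix."""
    for i in range(len(w) + 1):
        if w[i:] == w[i:][::-1]:
            return w + w[:i][::-1]


def standard_prefix(directive):
    """Palindromic prefix obtained by iterating u -> (u x)^(+) over the
    letters x of a finite prefix of the directive word."""
    u = ""
    for x in directive:
        u = palindromic_closure(u + x)
    return u


def complexity(word, n):
    """Number of distinct factors of length n occurring in a finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


tribonacci = standard_prefix("abc" * 3)   # 3-strict case: expect 2n + 1
fibonacci = standard_prefix("ab" * 5)     # 2-strict case: expect n + 1
print([complexity(tribonacci, n) for n in (1, 2, 3)])
print([complexity(fibonacci, n) for n in (1, 2, 3)])
```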



3 Episkew words 

Recall that a finite word $w$ is said to be finite Sturmian (resp. finite episturmian)
if $w$ is a factor of some infinite Sturmian (resp. episturmian) word.
When considering factors of infinite episturmian words, it suffices to consider
only the strict standard ones (i.e., characteristic Arnoux-Rauzy sequences).
Indeed, for any factor $u$ of an episturmian word, there exists a strict standard
episturmian word also having $u$ as a factor. Thus, finite episturmian words are
exactly the finite Arnoux-Rauzy words considered by Mignosi and Zamboni
[17].
In this section, we define episkew words, which were alluded to (but not
explicated) in the recent paper [9]. The following proposition gives a number
of equivalent definitions of such infinite words.

Notation: Denote by $v_p$ the prefix of length $p$ of a given infinite word $v$.

Proposition 3.1 An infinite word $t$ with $\mathrm{Alph}(t) = A$ is episkew if equivalently:

(i) $t$ is non-recurrent and all of its factors are (finite) episturmian;

(ii) there exists an infinite sequence $(t^{(i)})_{i \ge 0}$ of non-recurrent infinite words
and a directive word $x_1x_2x_3\cdots$ ($x_i \in A$) such that $t^{(0)} = t, \ldots, t'^{(i-1)} =
\psi_{x_i}(t^{(i)})$, where $t'^{(i-1)} = t^{(i-1)}$ if $t^{(i-1)}$ begins with $x_i$, and $t'^{(i-1)} = x_it^{(i-1)}$
otherwise;

(iii) there exists a letter $x \in A$ and a standard episturmian word $s$ on $A \setminus \{x\}$
such that $t = v\mu(s)$, where $\mu$ is a pure epistandard morphism on $A$ and
$v$ is a non-empty suffix of $\mu(\tilde{s}_px)$ for some $p \in \mathbb{N}$.

Moreover, $t$ is said to be a strict episkew word if $s$ is strict on $A \setminus \{x\}$, i.e.,
if each letter in $A \setminus \{x\}$ occurs infinitely often in the directive word $x_1x_2x_3\cdots$.

PROOF. (i) $\Rightarrow$ (ii): Since all of the factors of $t$ are finite episturmian, there
exists a letter, $x_1$ say, that is separating for $t$. If $t$ does not begin with $x_1$,
consider $t' = x_1t$; otherwise consider $t' = t$. Then, $t' = \psi_{x_1}(t^{(1)})$ for some
$t^{(1)} \in A^\omega$. Continuing in this way, we obtain infinite words $t^{(2)}, t'^{(2)}, t^{(3)}, t'^{(3)},
\ldots$ with $t'^{(i-1)}$ as in the statement. Clearly, if some $t^{(i)}$ is recurrent then $t$ is
also recurrent, in which case $t$ is episturmian by [13, Theorem 3.10]. Thus all
of the $t^{(i)}$ are non-recurrent.

(ii) $\Rightarrow$ (iii): We proceed by induction on $|A|$. The starting point of the induction
(i.e., $|A| = 2$) will be considered later.

Let $\Delta := x_1x_2x_3\cdots$. If $A = \mathrm{Ult}(\Delta)$ then any letter in $A$ is separating for
infinitely many $t^{(i)}$, thus is recurrent in all $t^{(i)}$. Consider any factor $w$ of $t$. As
$|\mathrm{Ult}(\Delta)| > 1$, we easily see that $w$ is a factor of $\psi_{x_1}\psi_{x_2}\cdots\psi_{x_q}(x)$ for some $q$
and letter $x$. Hence $w$ is recurrent in $t$ and it follows that $t$ itself is recurrent;
a contradiction. Thus, there exists a letter $x$ in $A$ and some minimal $n$ such
that $x$ is not recurrent in $t^{(n)}$. Two cases are possible:

Case 1: $x$ does not occur in $t^{(n)}$. Then $|\mathrm{Alph}(t^{(n)})| < |A|$; whence, by induction,
$t^{(n)}$ has the desired form and clearly $t$ also has the desired form. More precisely,
if we let $B := A \setminus \{x\}$, then $t^{(n)} = v\lambda(s)$ where $s$ is a standard episturmian
word on $B \setminus \{y\}$ for some letter $y \ne x$, $\lambda$ is a pure epistandard morphism on
$B$, and $v$ is a non-empty suffix of $\lambda(\tilde{s}_qy)$ for some $q \in \mathbb{N}$. It easily follows that
$t = v\mu(s)$ where $s$ is a standard episturmian word on $A \setminus \{y\}$, $\mu$ is a pure
epistandard morphism on $A$, and $v$ is a non-empty suffix of $\mu(\tilde{s}_py)$ for some
$p \in \mathbb{N}$.
Case 2: $x$ occurs in $t^{(n)}$. We now show that $x$ occurs exactly once in $t^{(n)}$.

Suppose on the contrary that $x$ occurs at least twice in $t^{(n)}$. Then, since
$x_{n+1}$ is separating for $t^{(n)}$, we have $xw^{(n)}x \in F(t^{(n)})$ for some non-empty word
$w^{(n)}$ for which $x_{n+1}$ is separating, and the first and last letter of $w^{(n)}$ is $x_{n+1}$
(that is, $w^{(n)}x = \psi_{x_{n+1}}(w^{(n+1)}x)$, where $w^{(n+1)} = \psi_{x_{n+1}}^{-1}(w^{(n)}x_{n+1}^{-1})$). Using the
fact that $|w^{(n)}x| = 2|w^{(n+1)}x| - |w^{(n+1)}x|_{x_{n+1}}$, we see that $|w^{(n+1)}| < |w^{(n)}|$.
So, continuing the above procedure, we obtain infinite words $t^{(n+1)}, t^{(n+2)},
\ldots$ containing similar shorter factors $xw^{(n+1)}x, xw^{(n+2)}x, \ldots$ until we reach
$t^{(q)}$, which contains $xx$. But this is impossible because the letter $x_{q+1} \ne x$ is
separating for $t^{(q)}$. Therefore $t^{(n)}$ contains only one occurrence of $x$ and we
have

$$t^{(n)} = uxs^{(n)} \quad \text{for some } u \in (A \setminus \{x\})^* \text{ and } s^{(n)} \in (A \setminus \{x\})^\omega.$$

Now, as $x$ is never separating for $t^{(j)}$, $j \ge n$, we can write $t^{(n+j)} = u^{(j)}xs^{(n+j)}$
for some words $u^{(j)}$ and $s^{(n+j)}$, and we have $s^{(n+j-1)} = \psi_{x_{n+j}}(s^{(n+j)})$, $j > 0$. It follows by
the Preliminaries (Section 2.3) that $s^{(n)}$ is a (recurrent) standard episturmian
word.

Now we study the factor $u$ preceding $x$ in $t^{(n)}$. Let $u' = x_{n+1}u$ if $u$ does not
begin with $x_{n+1}$; otherwise let $u' = u$. Then $u'x$ is a prefix of $t'^{(n)}$. Moreover,
since $x_{n+1}$ is separating for $u'x$, we have $u'x = \psi_{x_{n+1}}(u^{(1)}x)$ where $u^{(1)} =
\psi_{x_{n+1}}^{-1}(u'x_{n+1}^{-1})$. Hence $t^{(n+1)} = u^{(1)}xs^{(n+1)}$, where $x_{n+2}$ is separating for $u^{(1)}x$
(if $u^{(1)} \ne \varepsilon$). Continuing in this way, we arrive at the infinite word $t^{(q)} = xs^{(q)}$
for some $q \ge n$, where $s^{(q)}$ is a standard episturmian word on $A \setminus \{x\}$.

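Reversing the procedure, we find that

$$t^{(n)} = ws^{(n)} \quad \text{where } w = ux \text{ is a non-empty suffix of } \psi_{x_{n+1}}\cdots\psi_{x_q}(x).$$

Suppose $(u_i)_{i \ge 1}$ is the sequence of palindromic prefixes of

$$s = \psi_{x_1}\cdots\psi_{x_n}(s^{(n)}) = \mu_n(s^{(n)}),$$

and the words $(h_i)_{i \ge 0}$ are the prefixes $(\mu_i(x_{i+1}))_{i \ge 0}$ of $s$. Then, letting $u_i^{(n)}$,
$h_i^{(n)}$, and $\mu_i^{(n)}$ denote the analogous elements for $s^{(n)}$, we have

$$\mu_0^{(n)} = \mathrm{Id}, \quad \mu_i^{(n)} = \psi_{x_{n+1}}\psi_{x_{n+2}}\cdots\psi_{x_{n+i}} \quad \text{for } i = 1, 2, \ldots,$$

and

$$h_0^{(n)} = x_{n+1}, \quad h_i^{(n)} = \mu_i^{(n)}(x_{n+i+1}) \quad \text{for } i = 1, 2, \ldots.$$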
Now, if $u \ne \varepsilon$, then $q > n$, and we have

$$\begin{aligned}
\psi_{x_{n+1}}\cdots\psi_{x_q}(x) = \mu_{q-n}^{(n)}(x) &= \mu_{q-n-1}^{(n)}\psi_{x_q}(x) \\
&= \mu_{q-n-1}^{(n)}(x_qx) \\
&= h_{q-n-1}^{(n)}\mu_{q-n-1}^{(n)}(x) \\
&\ \ \vdots \\
&= h_{q-n-1}^{(n)}\cdots h_1^{(n)}h_0^{(n)}x = u_{q-n+1}^{(n)}x \quad \text{(by (3))}.
\end{aligned}$$

Therefore, $w = ux$ where $u$ is a (possibly empty) suffix of the palindromic prefix
$u_{q-n+1}^{(n)}$ of $s^{(n)}$. That is, $u$ is the reversal of some prefix of $s^{(n)}$; in particular

$$u = \tilde{s}_p^{(n)} \quad \text{for some } p \in \mathbb{N},$$

and hence

$$t^{(n)} = \tilde{s}_p^{(n)}xs^{(n)}.$$

So, passing back from $t^{(n)}$ to $t$, we find that

$$t = v\mu_n(s^{(n)}) = vs \quad \text{where } v \text{ is a non-empty suffix of } \mu_n(\tilde{s}_p^{(n)}x).$$



It remains to treat the case $|A| = 2$. Reasoning as previously we see that for
some $n$, $t^{(n)} = y^pxy^\omega$ where $x \ne y \in A$; whence the desired form for $t$.

(iii) $\Rightarrow$ (i): It suffices to show that the factors of $\tilde{s}_pxs$ are (finite) episturmian.
This is trivial for factors not containing the letter $x$. Suppose $w$ is a factor
containing $x$. Then $w$ is a factor of $u_rxu_r$ where $u_r$ is a long enough palindromic
prefix of $s$. Thus it remains to show that $u_rxu_r$ is episturmian and this is
true because it is $(u_rx)^{(+)}$, which is a palindromic prefix of some standard
episturmian word. □
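The last step of the proof uses the identity $(u_rx)^{(+)} = u_rxu_r$ for a palindrome $u_r$ in which the letter $x$ does not occur. A small Python check of this identity on one sample palindrome follows; the function name and the sample data are assumptions of this illustration.

```python
def palindromic_closure(w):
    """Shortest palindrome having w as a prefix."""
    for i in range(len(w) + 1):
        if w[i:] == w[i:][::-1]:
            return w + w[:i][::-1]


u_r = "abaaba"   # a palindromic prefix of the Fibonacci word, on {a, b}
x = "c"          # a letter that does not occur in u_r

closure = palindromic_closure(u_r + x)
print(closure, closure == u_r + x + u_r)
```

Since $x$ does not occur in $u_r$, the longest palindromic suffix of $u_rx$ is the single letter $x$, which is why the closure simply appends the reversal of $u_r$.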

Remark 3.2 Episkew words on a 2-letter alphabet are precisely the skew 
words, defined in Section 2.2. 



3.1 Fine words 



Definition 3.3 An acceptable pair is a pair $(a, <)$ where $a$ is a letter and
$<$ is a lexicographic order on $A^+$ such that $a = \min(A)$.

Definition 3.4 [9] An infinite word t on A is said to be fine if there exists 
an infinite word s such that min(t) = as for any acceptable pair (a, <). 

Note. Since there are only two lexicographic orders on words over a 2-letter 
alphabet, a fine word t over {a, b} (a < b) satisfies (min(t), max(t)) = (as, bs)
for some infinite word s. 
Pirillo [21] characterized fine words over a 2-letter alphabet. Specifically:

Proposition 3.5 Let t be an infinite word over {a, b}. The following proper- 
ties are equivalent: 

(i) t is fine, 

(ii) either $t$ is aperiodic Sturmian, or $t = v\mu(x)^\omega$ where $\mu$ is a pure standard
Sturmian morphism on $\{a, b\}$, and $v$ is a non-empty suffix of $\mu(x^py)$ for
some $p \in \mathbb{N}$ and $x, y \in \{a, b\}$ ($x \ne y$). □

In other words, a fine word over two letters is either an aperiodic Sturmian 
word or an ultimately periodic (but not periodic) infinite word, all of whose 
factors are Sturmian, i.e., a skew word (see Section 2.2). Recently, Glen [9] 
generalized this result to infinite words over two or more letters; that is, an 
infinite word t is fine if and only if t is either a strict episturmian word or a 
strict episkew word. 



4 A characterization of finite episturmian words 



Let $w \in A^\infty$ and let $k$ be a positive integer. Recall that $\min(w|k)$
(resp. $\max(w|k)$) denotes the lexicographically smallest (resp. greatest) factor
of $w$ of length $k$ for the given order (where $|w| \ge k$ for $w$ finite).

Definition 4.1 For a finite word $w \in A^+$ and a given order, $\min(w)$ will
denote $\min(w|k)$ where $k$ is maximal such that all $\min(w|j)$, $j = 1, 2, \ldots, k$,
are prefixes of $\min(w|k)$. In the case $A = \{a, b\}$, $\max(w)$ is defined similarly.

Example 4.2 Suppose $w = baabacababac$. Then, for the orders $b < a < c$ and
$b < c < a$ on the 3-letter alphabet $\{a, b, c\}$:

$\min(w|1) = b$
$\min(w|2) = ba$
$\min(w|3) = bab$
$\min(w|4) = baba$
$\min(w|5) = babac = \min(w)$

Notice that, in the above example, $\min(w)$ is a suffix of $w$; in fact, this interesting
property is true in general, as shown below.

Proposition 4.3 For any finite word $w$ and a given order, $\min(w)$ is a suffix
of $w$. Moreover, $\min(w)$ is unioccurrent (i.e., has only one occurrence) in $w$.

PROOF. If $\min(w)$ ($= \min(w|k)$, say) has an occurrence in $w$ that is not a
suffix of $w$, then $\min(w|k+1) = \min(w|k)x$ for some letter $x$, contradicting
the maximality of $k$. Hence $\min(w)$ occurs just once in $w$ as a suffix. □
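Definition 4.1 and Proposition 4.3 can be checked on Example 4.2 with a few lines of Python. This is a minimal sketch: the function names and the encoding of an order as a string of letters from smallest to greatest are assumptions of this illustration.

```python
def min_factor(w, k, order):
    """Lexicographically smallest factor of w of length k for the given order."""
    rank = {letter: i for i, letter in enumerate(order)}
    return min(
        (w[i:i + k] for i in range(len(w) - k + 1)),
        key=lambda word: [rank[c] for c in word],
    )


def finite_min(w, order):
    """Definition 4.1: min(w|k) for the largest k such that
    min(w|1), ..., min(w|k) are all prefixes of min(w|k)."""
    best = min_factor(w, 1, order)
    for k in range(2, len(w) + 1):
        candidate = min_factor(w, k, order)
        if not candidate.startswith(best):
            break                      # the chain of prefixes stops here
        best = candidate
    return best


w = "baabacababac"
for order in ("bac", "bca"):           # the orders b < a < c and b < c < a
    m = finite_min(w, order)
    print(order, m, w.endswith(m), w.count(m))
```

Stopping at the first length where the chain of prefixes breaks is enough, because if all $\min(w|j)$ with $j \le k+1$ are prefixes of $\min(w|k+1)$, then the same holds for $k$.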

Notation: From now on, it will be convenient to denote by $v_p$ the prefix of
length $p$ of a given finite or infinite word $v$ (where $|v| \ge p$ for $v$ finite).






In this section, we shall prove the following characterization of finite
episturmian words.



Theorem 4.4 A finite word $w$ on $A$ is episturmian if and only if there exists
a finite word $u$ such that, for any acceptable pair $(a, <)$, we have

$$au_{|m|-1} \le m \tag{4}$$

where $m = \min(w)$ for the considered order.

The following two lemmas are needed for the proof of Theorem 4.4. 

Lemma 4.5 If $w$ and $u$ satisfy inequality (4) for all acceptable pairs $(a, <)$
and $|\mathrm{Alph}(w)| > 1$, then $u$ is non-empty and its first letter is separating for $w$.

PROOF. Let $a \ne b \in \mathrm{Alph}(w)$ and let $(a, <)$, $(b, <')$ be two acceptable pairs.
As the corresponding two $\min(w)$'s are suffixes of $w$ (by Proposition 4.3), they
have different lengths; whence $|u| > 0$.

Now we show that the first letter $u_1$ of $u$ is separating for $w$. Indeed, if this
is not true, then there exist letters $z, z' \in A \setminus \{u_1\}$ (possibly equal) such that
$zz' \in F(w)$. But $\min(A) = z \le z' < u_1$ for some acceptable pair $(z, <)$, in
which case $zz' < zu_1$, contradicting the fact that $zu_1 \le m_2$. □

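Lemma 4.6 Consider $w, w' \in A^*$ and some letter $z \in A$. For any given order
$<$ on $A$:

(i) if $w$ does not end with $z$ and $w = \psi_z(w')$, then

$$\min(w) = \begin{cases} \psi_z(\min(w')) & \text{if } \min(w) \text{ begins with } z, \\ z^{-1}\psi_z(\min(w')) & \text{otherwise;} \end{cases}$$

(ii) if $w$ ends with $z$ and $w = \psi_z(w')z$, then

$$\min(w) = \begin{cases} \psi_z(\min(w'))z & \text{if } \min(w) \text{ begins with } z, \\ z^{-1}\psi_z(\min(w'))z & \text{otherwise.} \end{cases}$$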



PROOF. We denote by $m$, $m'$ the respective words $\min(w)$, $\min(w')$.

Consider first the simplest case: $w$ does not end with $z$, $m$ begins with $z$.
Thus $w = \psi_z(w')$ for some word $w'$ that does not end with $z$. Write $e := \psi_z(m')$.
We have to show that $e = m$. Let $k$ be maximal such that $e_i = \min(w|i)$ for
$i = 1, \ldots, k$. Suppose $k < |e|$. Then there exist $x, y \in A$, $x > y$, such that
$e_{k+1} = e_kx$ and $e_ky \in F(w)$. Thus, as $z$ is separating for $w$, $e_k = e_{k-1}z$, with
$e_{k-1} = \psi_z(m'_q)$ for some $q$. Since $m$ begins with $z$, $\min(\mathrm{Alph}(w')) = z$ and we
have $e_{k+1} = \psi_z(m'_qx) = \psi_z(m'_{q+1})$. Also, if $y \ne z$ then $e_ky = \psi_z(m'_qy)$ with
$m'_qy \in F(w')$. If $y = z$, then as $w$ does not end with $z$, $e_kyd = e_{k-1}zyd$ is
a factor of $w$ for some letter $d$; whence again $m'_qy \in F(w')$. As $x > y$, this
contradicts $m'_{q+1} = \min(w'|q+1)$.
Thus $k = |e|$. It suffices now to show that no $ex$, $x \in A$, occurs in $w$.
Otherwise $ex \in F(w)$. As $m'$ does not end with $z$, also $e$ does not end with
$z$, thus $x = z$. So, as $w$ does not end with $z$, $ezy = exy$ occurs in $w$ for some
letter $y$, whence $\psi_z(m'y) \in F(w)$, i.e., $m'y \in F(w')$, contradicting the unioccurrence of $m'$ in $w'$.

Now we pass to the most complicated case: $w$ ends with $z$, $m$ does not begin
with $z$, $w = \psi_z(w')z$. Letting $e := z^{-1}\psi_z(m')z$, we need to show that $e = m$.
Let $k$ be maximal such that $e_i = \min(w|i)$ for $i = 1, \ldots, k$. Suppose $k < |e|$.
Then, there exist $f \in F(w)$ and $x, y \in A$ with $x > y$, such that $e_{k+1} = e_kx$ and
$f = e_ky$. As $w$ begins with $z$, clearly $ze_{k+1}, zf \in F(w)$. Also $e_k$ ends with $z$,
hence $ze_{k+1} = \psi_z(m'_q)zx$ and $zf = \psi_z(m'_q)zy$ for some $q < |m'|$. We distinguish
three cases: $x, y \ne z$; $x = z$; $y = z$.

The first case leads to $ze_{k+1} = \psi_z(m'_qx)$ and $zf = \psi_z(m'_qy)$; whence $m'_{q+1} >
m'_qy$, contradicting the definition of $m'$. For the case $x = z$, $y \ne z$, let $m' = m'_qu$,
$u \in A^*$, and recall that $ze = \psi_z(m')z$. We get $ze = \psi_z(m'_q)\psi_z(u)z$, thus $\psi_z(u)z$
begins with $zz$, and so $u$ begins with $z$. Hence $m'_{q+1} = m'_qz = m'_qx$, leading to
a contradiction as above. The third case is similar.

Thus $k = |e|$ and it remains to show that no $ex$, $x$ a letter, occurs in $w$.
Consider for instance the case $x = z$. Indeed $ez \in F(w)$ implies $z^{-1}\psi_z(m')zz \in
F(\psi_z(w')z)$, so $\psi_z(m')z \in F(\psi_z(w'))$, whence $m'd \in F(w')$ for some letter $d$; a
contradiction.

The other two cases in the lemma have similar proofs. □ 

Example 4.7 Let us illustrate the most complicated case when $w$ ends with
$z$ and $m$ does not begin with $z$. Let $w' = aa$, $z = b$, $w = babab = \psi_b(w')b$.
Then $m' = aa$ and $m = abab = b^{-1}\psi_b(m')b$.
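The identity of Lemma 4.6(ii) can be checked mechanically on the data of Example 4.7. In the Python sketch below, the function names and the encoding of an order as a string of letters from smallest to greatest are assumptions of this illustration; `predicted_min` simply evaluates the right-hand side of the lemma.

```python
def psi(a, word):
    """Apply psi_a: a -> a, x -> ax for every letter x != a."""
    return "".join(c if c == a else a + c for c in word)


def finite_min(w, order):
    """min(w) of Definition 4.1 for the order given as a string of letters."""
    rank = {letter: i for i, letter in enumerate(order)}
    best = ""
    for k in range(1, len(w) + 1):
        candidate = min(
            (w[i:i + k] for i in range(len(w) - k + 1)),
            key=lambda word: [rank[c] for c in word],
        )
        if not candidate.startswith(best):
            break
        best = candidate
    return best


def predicted_min(w, w_prime, z, order):
    """Right-hand side of Lemma 4.6 for w = psi_z(w') or w = psi_z(w') z."""
    image = psi(z, finite_min(w_prime, order))
    if w.endswith(z):
        image += z
    # Drop the leading z (the formal inverse) when min(w) does not begin with z.
    return image if finite_min(w, order).startswith(z) else image[1:]


w_prime, z = "aa", "b"
w = psi(z, w_prime) + z                      # w = babab
for order in ("ab", "ba"):
    print(order, finite_min(w, order), predicted_min(w, w_prime, z, order))
```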

Proof of Theorem 4.4. ONLY IF part: $w$ is finite episturmian, so is a factor
of some standard episturmian word $s$. By [20, Proposition 3.2] or [14, Theorem
0.1], $as \le \min(s)$ for any acceptable pair $(a, <)$. Thus, $m = \min(w)$ trivially
satisfies

$$as_{|m|-1} \le m;$$

that is, with $r$ large enough and $u = s_r$, inequality (4) is satisfied for any
acceptable pair $(a, <)$, as required.

IF part: Remark first that if (4) is satisfied for some $u$ then it also holds for
any $uv$, $v \in A^*$. Also, if $a \notin \mathrm{Alph}(w)$ then (4) is trivially satisfied, allowing us
to limit our attention to acceptable pairs $(a, <)$ with $a \in \mathrm{Alph}(w)$.

Let $x := u_1$, the first letter of $u$. The proof will proceed by induction on
$\ell = |w|$. If $w$ is a letter, then $w$ is clearly finite episturmian, i.e., the initial
case $|w| = 1$ is trivially true.

We now distinguish two cases according to whether or not w begins with x. 
Case 1: $w$ begins with $x$. Suppose for instance $w$ does not end with $x$ (the other
case is similar). Then, by Lemma 4.5, $w = \psi_x(w')$ for some word $w'$ that does
not end with $x$. Further, it follows from Lemma 4.6 that, for any acceptable
pair $(a, <)$, $\min(w) = \psi_x(\min(w'))$ if $x = a$ (resp. $\min(w) = x^{-1}\psi_x(\min(w'))$ if
$x \ne a$). For short, let $m$, $m'$ denote the respective words $\min(w)$, $\min(w')$. The
induction step will consist in constructing some word $u'$ such that inequality
(4) holds for $w'$, $u'$.

For any acceptable pair $\pi = (a, <)$ with $a \in \mathrm{Alph}(w)$, let $h = h(\pi)$ be
maximal such that $au_h$ is a prefix of $m$, and let $H$ be the largest $h(\pi)$ for all
such pairs $\pi$. As $u_H \in F(w)$ and begins with $x$, we have $u_H = \psi_x(v)$ for some
word $v$.

Now consider an acceptable pair $\pi = (a, <)$ as above with $h < H$. If $au_h = m$
then we see that $av_q = m'$ for some $q$. Otherwise there exist letters $y < z$
such that $au_{h+1} = au_hy$ and $m_{h+2} = au_hz$; whence easily $av_{q+1} = av_qy$ and
$m'_{q+2} = av_qz$, and thus $av_{|m'|-1} < m'$. Now, for any pair $(a, <)$ such that $h = H$
we have either $au_H = m$ or $au_{H+1} = au_Hy < m_{H+2} = m_{H+1}z$, for some letters
$y < z$; whence $av = m'$ or $avy < m'$.

Consequently we can take either $u' = v$ or $u' = vy$. This is the induction
step. Clearly $|w'| = \ell' < \ell = |w|$ unless $|\mathrm{Alph}(w)| = 1$, a trivial case.

Case 2: w does not begin with x. In this case, we have w = x^{-1}ψ_x(w') for some 
word w' that does not begin with x. Consider W = xw = ψ_x(w'). Then, for any 
acceptable pair (a, <) with a ≠ x, we have easily min(W) = min(w). The same 
holds if a = x and aa occurs in w because in this case min(W) begins with aa 
and W begins with ay for some y ≠ x; thus min(W) ∈ F(w). If x = a and 
xx ∉ F(w), then the letter x does not occur in w', so inequality (4) is trivially 
satisfied for w' (as stated previously). Thus we can use W = xw instead of 
w for performing the induction step as in Case 1, ignoring acceptable pairs of 
the form (x, <). However, as |W| = |w| + 1, it is possible that |w'| = |w| or 
|w'| = |w| + 1, which are trivial cases corresponding to words w' of the form 
y x^p for some letter y ≠ x and p ∈ N. □ 
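
The reduction of Case 2 can be traced on the word w = baabacababac of Example 4.2 with x = a. The Python sketch below is only illustrative and rests on three assumptions: the usual definition of ψ_x (x ↦ x and y ↦ xy for y ≠ x); min(w) taken as the longest word all of whose prefixes are the smallest factors of w of their lengths; and Lemma 4.6 read as min(W) = ψ_x(min(w')) when a = x and min(W) = x^{-1}ψ_x(min(w')) otherwise. The helper names psi, psi_inverse and min_word are hypothetical.

```python
from itertools import permutations

def psi(x, w):
    """Assumed definition of psi_x: x -> x and y -> xy for each letter y != x."""
    return "".join(c if c == x else x + c for c in w)

def psi_inverse(x, W):
    """Return w' with psi(x, w') == W, or None if W is not an image under psi_x."""
    out, i = [], 0
    while i < len(W):
        if W[i] != x:
            return None            # a letter y != x must be preceded by x
        if i + 1 < len(W) and W[i + 1] != x:
            out.append(W[i + 1])   # the block "xy" comes from the letter y
            i += 2
        else:
            out.append(x)          # a lone x comes from the letter x
            i += 1
    return "".join(out)

def min_word(w, order):
    """min(w) for the order listed in `order` (smallest letter first)."""
    rank = {c: i for i, c in enumerate(order)}
    best = ""
    for k in range(1, len(w) + 1):
        factors = {w[i:i + k] for i in range(len(w) - k + 1)}
        cand = min(factors, key=lambda f: [rank[c] for c in f])
        if not cand.startswith(best):
            break                  # min(w|k) no longer extends min(w|k-1)
        best = cand
    return best

w, x = "baabacababac", "a"
W = x + w                          # Case 2: w does not begin with x
w_prime = psi_inverse(x, W)        # W = psi_x(w')

for order in permutations("abc"):
    image = psi(x, min_word(w_prime, order))
    expected = image if order[0] == x else image[1:]
    print("".join(order), min_word(W, order), expected)
```

For each of the six orders the two printed words should agree, which is the relation between min(W) and min(w') that the induction step relies on.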



Example 4.8 Recall the finite word w = baabacababac from Example 4.2. For 
the different orders on {a, b, c}, we have 

• a < b < c or a < c < b: min(w) = aabacababac, 

• b < a < c or b < c < a: min(w) = babac, 

• c < a < b or c < b < a: min(w) = cababac. 

It can be verified that a finite word u satisfying (4) must begin with aba and 
one possibility is u = abacaaaaaa; thus w is a finite episturmian word. 

Note. In the above example, any two acceptable pairs involving the same letter 
give the same min(w), which is not the case in general. 
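
The computations of Example 4.8 can be reproduced mechanically. The following Python sketch makes several assumptions that should be checked against the definitions of Section 4: an acceptable pair (a, <) is an order on the alphabet together with its smallest letter a; min(w) is the longest word all of whose prefixes are the smallest factors of w of their lengths; u_k is the prefix of u of length k; and inequality (4) reads a u_{|m|-1} ≤ m with m = min(w). The function names are hypothetical.

```python
from itertools import permutations

def min_word(w, order):
    """min(w) for the order listed in `order` (smallest letter first)."""
    rank = {c: i for i, c in enumerate(order)}
    best = ""
    for k in range(1, len(w) + 1):
        factors = {w[i:i + k] for i in range(len(w) - k + 1)}
        cand = min(factors, key=lambda f: [rank[c] for c in f])
        if not cand.startswith(best):
            break                  # min(w|k) no longer extends min(w|k-1)
        best = cand
    return best

def satisfies_inequality_4(w, u, alphabet="abc"):
    """Check a u_{|m|-1} <= m, m = min(w), for every order on `alphabet`."""
    for order in permutations(alphabet):
        rank = {c: i for i, c in enumerate(order)}
        key = lambda word: [rank[c] for c in word]
        m = min_word(w, order)
        lhs = order[0] + u[:len(m) - 1]   # the letter a followed by a prefix of u
        if key(lhs) > key(m):
            return False
    return True

w = "baabacababac"
for order in permutations("abc"):
    print(" < ".join(order), "->", min_word(w, order))
print(satisfies_inequality_4(w, "abacaaaaaa"))
```

Comparing lists of letter ranks lets the same code handle every order on the alphabet without relying on the built-in ordering of characters.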
A corollary of Theorem 4.4 is the following new characterization of finite 
Sturmian words (i.e., finite balanced words). 

Corollary 4.9 A finite word w on A = {a, b}, a < b, is not Sturmian (in 
other words, not balanced) if and only if there exists a finite word u such that 
aua is a prefix of min(w) and bub is a prefix of max(w). □ 

Example 4.10 For w = ababaabaabab, min(w) = aabaabab and max(w) = 
babaabaabab. The longest common prefix of a^{-1}min(w) and b^{-1}max(w) is 
abaaba, which is followed by b in min(w) and a in max(w). Thus w is Sturmian. 
However, if we take w = aabababaabaab for instance, then w is not Sturmian 
since min(w) = auab and max(w) = bubaabaab where u = aba. 
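
The test of Corollary 4.9 is easy to automate. The Python sketch below is an illustration under stated assumptions, not the authors' procedure: min(w) and max(w) are taken to be the longest words all of whose prefixes are the smallest (resp. greatest) factors of w of their lengths, and the order a < b is the built-in order on characters. The names extremal_word, witness and is_balanced are hypothetical; is_balanced is a brute-force check of the balance property used only for comparison.

```python
def extremal_word(w, largest=False):
    """min(w), or max(w) when largest=True, for the order a < b."""
    pick = max if largest else min
    best = ""
    for k in range(1, len(w) + 1):
        cand = pick({w[i:i + k] for i in range(len(w) - k + 1)})
        if not cand.startswith(best):
            break                  # the extremal factor of length k does not extend
        best = cand
    return best

def witness(w):
    """Return u with aua a prefix of min(w) and bub a prefix of max(w), else None."""
    lo, hi = extremal_word(w), extremal_word(w, largest=True)
    if not (lo.startswith("a") and hi.startswith("b")):
        return None
    for n in range(min(len(lo), len(hi)) - 1):    # n = |u|
        u = lo[1:1 + n]
        if hi[1:1 + n] != u:
            break                                 # common prefix exhausted
        if lo[1 + n] == "a" and hi[1 + n] == "b":
            return u
    return None

def is_balanced(w):
    """Brute force: equal-length factors differ by at most 1 in their number of a's."""
    for k in range(1, len(w) + 1):
        counts = {w[i:i + k].count("a") for i in range(len(w) - k + 1)}
        if max(counts) - min(counts) > 1:
            return False
    return True

for w in ("ababaabaabab", "aabababaabaab"):
    print(w, extremal_word(w), extremal_word(w, largest=True),
          witness(w), is_balanced(w))
```

On the two words of Example 4.10, a witness u is found exactly when the brute-force balance check fails, as the corollary predicts.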

Remark 4.11 An unrelated connection between finite balanced words (i.e., 
finite Sturmian words) and lexicographic ordering was recently studied by 
Jenkinson and Zamboni [11], who presented three new characterizations of 
'cyclically' balanced finite words via orderings. Their characterizations are 
based on the ordering of a shift orbit, either lexicographically or with respect 
to the 1-norm, which counts the number of occurrences of the symbol 1 in a 
given finite word over {0, 1}. 

5 A characterization of infinite episturmian words in a wide sense 

In this last section, we characterize by lexicographic order the set of all 
infinite words whose factors are (finite) episturmian. Such infinite words are 
exactly the episturmian and episkew words, as shown in Proposition 5.2 below. 

Definition 5.1 An infinite word is said to be episturmian in the wide 
sense if all of its factors are (finite) episturmian. 

We have the following easy result: 

Proposition 5.2 An infinite word is episturmian in the wide sense if and 
only if it is episturmian or episkew. 

PROOF. Let t be an infinite word. First suppose that t is episturmian in 
the wide sense. Clearly, if t is recurrent, then t is episturmian (cf. proof of (i) 
⇒ (ii) in Proposition 3.1). On the other hand, if t is non-recurrent, then t is 
episkew, by Proposition 3.1. 

Conversely, if t is episturmian or episkew, then all of its factors are (finite) 
episturmian, and hence t is episturmian in the wide sense. □ 

Remark 5.3 Recall that in the 2-letter case the balanced infinite words (all 
of whose factors are finite Sturmian) are precisely the Sturmian and skew 
infinite words. As such, 'episturmian words in the wide sense' can be viewed 
as a natural generalization of balanced infinite words to an arbitrary finite 
alphabet. 
As a consequence of Theorem 4.4, we obtain the following characterization 
of episturmian words in the wide sense (episturmian and episkew words). 

Corollary 5.4 An infinite word t on A is episturmian in the wide sense if 
and only if there exists an infinite word u such that 

au ≤ min(t)    (5) 

for any acceptable pair (a, <). 

PROOF. IF part: Inequality (5) holds. So, for any factor w of t and any 
acceptable pair (a, <), we have 

a u_{|m|-1} ≤ m where m = min(w). 

Therefore, by Theorem 4.4, w is a finite episturmian word; whence t is 
episturmian in the wide sense since any factor of t is (finite) episturmian. 

ONLY IF part: t is episturmian in the wide sense, so all of its factors are (finite) 
episturmian; in particular, any prefix t_q of t is finite episturmian. Therefore, by 
Theorem 4.4, there exists a finite word, say u(q), such that, for any acceptable 
pair (a, <), we have 

a u(q)_{|m(q)|-1} ≤ min(t_q) where m(q) = min(t_q). 

On the other hand, for any k ∈ N there exists a number r(k) ∈ N such 
that, for any q > r(k), t_q contains all the min(t|k) as factors for all acceptable 
pairs (a, <). It follows then that min(t|k) is a prefix of min(t_q); in particular 
|min(t_q)| ≥ k, and hence |u(q)| ≥ k − 1. Thus, the |u(q)| are unbounded. 

Let us denote by u a limit point of the u(q). Then, for any n, infinitely many 
u(q) have u_n as a prefix. 

Now, for any given k ∈ N^+ and acceptable pair (a, <), there exists a q (as 
above) such that 

a u_{k-1} = a u(q)_{k-1} ≤ min(t_q)_k = min(t|k). 

Thus au ≤ min(t). □ 
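
As an illustration of inequality (5), consider the Tribonacci word t, the fixed point of a ↦ ab, b ↦ ac, c ↦ a, which is a standard Arnoux-Rauzy word. The Python sketch below relies on a fact recalled from the literature rather than proved in this section, namely that such a word satisfies at = min(t) for every acceptable pair (a, <), so that u = t is a suitable choice in Corollary 5.4. It also assumes that a prefix of length 1705 contains every factor of t of length at most 8. The function names are hypothetical.

```python
from itertools import permutations

def tribonacci_prefix(iterations=12):
    """Prefix of the Tribonacci word obtained by iterating a->ab, b->ac, c->a."""
    rules = {"a": "ab", "b": "ac", "c": "a"}
    w = "a"
    for _ in range(iterations):
        w = "".join(rules[c] for c in w)
    return w

def min_factor(t, k, order):
    """Smallest factor of length k of the finite word t for the given order."""
    rank = {c: i for i, c in enumerate(order)}
    factors = {t[i:i + k] for i in range(len(t) - k + 1)}
    return min(factors, key=lambda f: [rank[c] for c in f])

t = tribonacci_prefix()
for order in permutations("abc"):
    a = order[0]                       # (a, <) is the acceptable pair for this order
    for k in range(1, 9):
        # compare min(t|k) with a followed by the prefix of t of length k - 1
        print("".join(order), k, min_factor(t, k, order), a + t[:k - 1])
```

In each printed row the last two words should coincide, which is the equality case of (5) on prefixes of length at most 8.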

In the 2-letter case, we have the following characterization of balanced infinite 
words; in other words, all Sturmian and skew infinite words. 

Corollary 5.5 An infinite word t on {a, b}, a < b, is balanced (i.e., Sturmian 
or skew) if and only if there exists an infinite word u such that 

au ≤ min(t) ≤ max(t) ≤ bu. □ 
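
For a concrete instance of Corollary 5.5, take t = f, the Fibonacci word (the fixed point of a ↦ ab, b ↦ a). The Python sketch below uses the classical fact, recalled here without proof, that af = min(f) and bf = max(f), so that u = f satisfies the chain of inequalities with equality at both ends. It assumes that a prefix of length 1597 contains every factor of f of length at most 12. The function names are hypothetical.

```python
def fibonacci_prefix(iterations=15):
    """Prefix of the Fibonacci word obtained by iterating a->ab, b->a."""
    w = "a"
    for _ in range(iterations):
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w

def extremes(t, k):
    """Smallest and greatest factors of length k of t for the order a < b."""
    factors = {t[i:i + k] for i in range(len(t) - k + 1)}
    return min(factors), max(factors)

f = fibonacci_prefix()
for k in range(1, 13):
    lo, hi = extremes(f, k)
    # compare with a f_{k-1} and b f_{k-1}, where f_{k-1} is the prefix of length k - 1
    print(k, lo, "a" + f[:k - 1], hi, "b" + f[:k - 1])
```

The built-in string comparison is used here because it agrees with the lexicographic order induced by a < b on words of equal length.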

Remark 5.6 A variation of the above result appears, under a different guise, 
in a paper by S. Gan [7, Lemma 4.4]. 